Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
2,140,000,000 tokens
Production Status:
Existing-used
Use:
Language Modelling
- Paper title: CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
- Paper track: Long paper
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Hicham El Boukkouri | English Wikipedia | N/A |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
522,000,000 tokens
Production Status:
Existing-used
Use:
Language Modelling
- Paper title: CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
- Paper track: Long paper
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Hicham El Boukkouri | PubMed abstracts | N/A |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
License:
Size:
505,000,000 tokens
Production Status:
Existing-used
Use:
Language Modelling
- Paper title: CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
- Paper track: Long paper
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Hicham El Boukkouri | MIMIC-III | N/A |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
MIT
Size:
21,695 entries
Production Status:
Newly created-finished
Use:
Speech Synthesis
- Paper title: Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities
- Paper track: Long paper
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Hao Zhang | URL domain names with annotations of internal segments | N/A |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
None
Production Status:
Existing-used
Use:
Metaphor Recognition
- Paper title: An analysis of language models for metaphor recognition
- Paper track: Long paper
- Paper status: Accept Oral
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Katja Markert | VU Amsterdam Metaphor Corpus | N/A |
Documentation:
None
Written
Ontology,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Creative Commons
Size:
3,800,000 concepts
Production Status:
Newly created-in progress
Use:
Knowledge Discovery/Representation
- Paper title: I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning
- Paper track: Long paper
- Paper status: Accept Oral
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Jungwoo Lim | ConceptNet 5 | N/A |
Documentation:
http://conceptnet5.media.mit.edu
Not Applicable
Tagger/Parser,
Language Type:
Multilingual
Languages:
Czech, English, German, Spanish, Turkish
Availability:
Freely Available
License:
Size:
None
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
- Paper track: Long paper
- Paper status: Accept Oral
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Víctor M. Sánchez-Cartagena | Stanford's NLP Parser | N/A |
Documentation:
None
Written
Corpus,
Language Type:
Bilingual
Languages:
English, Turkish
Availability:
Freely Available
License:
Size:
207,678 sentences
Production Status:
Existing-used
Use:
Machine Translation, Speech-to-Speech Translation
- Paper title: Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
- Paper track: Long paper
- Paper status: Accept Oral
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Víctor M. Sánchez-Cartagena | SETIMES | N/A |
Documentation:
None
Speech/Written
Metadata,
Language Type:
Bilingual
Languages:
English, Spanish
Availability:
Freely Available
License:
Creative Commons
Size:
None
Production Status:
Existing-updated
Use:
Machine Translation, Speech-to-Speech Translation
- Paper title: The Two Shades of Dubbing in Neural Machine Translation
- Paper track: Short paper
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Alina Karakanta | Heroes-on-off | N/A |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Access is restricted to SIGMathLing members under the SIGMathLing Non-Disclosure-Agreement, since for most arXiv articles the right of distribution was only given (or assumed) to arXiv itself. Membership is free and granted on the honor system.
License:
SIGMathLing Non-Disclosure-Agreement (research-only use)
Size:
10,555,689 paragraphs
Production Status:
Newly created-finished
Use:
Document Classification, Text Categorisation
- Paper title: Scientific Statement Classification over arXiv.org
- Paper track: Written/poster presentation with demo
- Paper status: Accept Poster+Demo
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Deyan Ginev | Statement Classification Dataset, arXMLiv 08.2018 | N/A |
Documentation:
None